Skip to content

[Do Not Merge] Avoid subscriber re-fire when storage writes retry#792

Open
elirangoshen wants to merge 2 commits into
Expensify:mainfrom
callstack-internal:elirangoshen/fix/retry-side-effects-idempotency
Open

[Do Not Merge] Avoid subscriber re-fire when storage writes retry#792
elirangoshen wants to merge 2 commits into
Expensify:mainfrom
callstack-internal:elirangoshen/fix/retry-side-effects-idempotency

Conversation

@elirangoshen
Copy link
Copy Markdown
Contributor

@elirangoshen elirangoshen commented May 25, 2026

Details

Follow-up to #787. After the cache-first/storage-second refactor of mergeCollectionWithPatches, the storage-retry path was identified as re-notifying subscribers once per retry attempt — waitForCollectionCallback subscribers fire on every keysChanged call by contract, so a transient storage error could surface 2-6 callback invocations for a single logical write. Same shape applies to multiSetWithRetry, setCollectionWithRetry, and partialSetCollection — all four flow through retryOperation, which re-enters the orchestrator on retry.

setWithRetry was already safe because broadcastUpdate short-circuits on hasChanged === false.

Fix shape

Two small pieces:

  1. !retryAttempt guard around the subscriber notification in each of the four functions. cache.set/merge stays unguarded — it's idempotent, runs every attempt, and is what restores cache state after retryOperation's eviction branch drops a key. The notification (keysChanged/keyChanged) is the one piece that's not safe to re-fire.

  2. skipNotify flag on remove() + passing true from retryOperation's eviction branch. Without this, eviction-during-retry would fire keyChanged(undefined) and strand subscribers in the "removed" state, since the orchestrator no longer re-fires keysChanged on retry. cache.drop and Storage.removeItem still run — only the spurious keyChanged(undefined) is skipped, and the imminent retry's cache.set covers cache restoration.

That's the entire diff: +157 / -9 across lib/OnyxUtils.ts and tests/unit/onyxUtilsTest.ts.

Related Issues

No separate GH issue — this PR addresses #787 review comment directly.

Linked E/App PR

Expensify/App#91626

Automated Tests

Added a new retry side-effect idempotency describe block in tests/unit/onyxUtilsTest.ts with one test per affected function. Each seeds a waitForCollectionCallback: true subscriber, mocks Storage.multiSet (or Storage.multiMerge for mergeCollection) to reject once then resolve, runs the write, and asserts the subscriber was called exactly 1 time across the retry chain:

  1. mergeCollection — waitForCollectionCallback subscriber fires once across retries
  2. Onyx.multiSet — collection subscriber fires once across retries
  3. Onyx.setCollection — collection subscriber fires once across retries
  4. OnyxUtils.partialSetCollection — collection subscriber fires once across retries

All 4 fail with Received number of calls: 2 if the !retryAttempt guard is reverted in any of the four orchestrators — confirmed locally via git stash of lib/OnyxUtils.ts.

All 6 existing storage eviction tests still pass without modification, validating that the skipNotify change to remove() preserves the eviction recovery path: the imminent retry's cache.set re-populates the cache for any evicted in-flight key, and subscribers never see a transient undefined because keyChanged(undefined) no longer fires from the eviction branch.

Local results:

  • npx jest455 / 455 pass across 17 suites.
  • npx tsc --noEmit — clean.

Manual Tests

The four refactored paths (mergeCollectionWithPatches, multiSetWithRetry, setCollectionWithRetry, partialSetCollection) are reached from the App via Pusher events, the OpenApp response, every Onyx.update batch that contains a MERGE_COLLECTION / MULTI_SET / SET_COLLECTION op, LHN refresh, Search filters, chat sends, mark-all-as-read, hold/unhold, workspace switching, etc. The companion App PR Expensify/App#91626 pins react-native-onyx to this branch's head for end-to-end exercise; the steps below are the same protocol used for #787.

Setup

  1. In the App repo, check out Expensify/App#91626 (elirangoshen/fix/retry-side-effects-idempotency on callstack-internal/Expensify-App), which bumps package.json's react-native-onyx to this branch's head.
  2. npm install under Node 20.20.0, then npm run web.
  3. Open https://dev.new.expensify.com:8082/ in Chrome with DevTools open.
  4. Sign in to a test account.

Functional smoke — no regression in the refactored paths

The refactor preserves the cache-first invariant from #787, so all of these should look indistinguishable from main to the user:

  1. Initial hydration — after login, LHN reports list and workspace switcher populate within a few seconds. No missing rows vs. a baseline session on main. (exercises mergeCollectionWithPatches via Pusher / OpenApp)
  2. Chat message — open a chat, send a message. Appears immediately (optimistic merge into reportActions_), confirms via Pusher, persists after reload. (mergeCollectionWithPatches)
  3. Mark all as read — click the mark-all-as-read action. All unread badges clear immediately and stay cleared after reload. (mergeCollectionWithPatches)
  4. Search & filter — open Search, apply a filter. Results populate within a few seconds; live updates as filters change; no duplicates. (mergeCollectionWithPatches + setCollectionWithRetry)
  5. Hold / unhold an expense — toggle hold on an expense in a report. Badge appears/clears immediately; persists after reload. (mergeCollectionWithPatches)
  6. Submit expense — FAB → Submit expense. Expense appears in the report immediately, no duplicate, persists after reload. (mergeCollectionWithPatches)
  7. Switch workspaces — switch via the workspace switcher. LHN reports filter to the new workspace within a few seconds. (mergeCollectionWithPatches)

Storage-failure simulation — core regression guard for this PR

This is the test that exercises the retry path the fix targets. With the fix in place, retries on storage failure should produce one subscriber notification per logical operation (not one per retry attempt), and brand-new keys should stay routed through multiSet even when an earlier multiMerge retry kicked in.

  1. With the App running and authenticated, open Chrome DevTools → Application tab → IndexedDB.
  2. Find the database used by Onyx (typically named OnyxDB).
  3. Right-click → Delete database while the App is still running and connected (do not reload).
  4. Immediately trigger a mergeCollection-driven action — switch workspaces, send a chat message, or apply a search filter.
  5. Expected with this fix: the UI updates correctly; the new state is visible to subscribers even though the underlying IDB write fails and retryOperation kicks in. Console shows storage error logs and retry attempts (look for Failed to save to storage. Error: ... retryAttempt: N/5 in Logger.logInfo), but no white screen, no stale UI, no data loss within the session, and no duplicate waitForCollectionCallback subscriber invocations (verify via React DevTools "Highlight updates when components render" or via a console.count instrumented in a temporary useOnyx callback if you want to be precise).
  6. Try the same with a multiSet-driven action (least common — typically a deep-link prep) and a setCollection-driven action (Search filter result population) to cover the other refactored paths.
  7. (Optional regression check) Revert this PR's commits locally and re-run step 4. Without the fix, you'll either see (a) Onyx.connect({waitForCollectionCallback: true}) callbacks firing twice in DevTools traces, or (b) on the routing bug specifically, no observable user-facing difference on IDB but the issue is fixed under the hood for non-IDB backends.

Cold-cache merge — preserves the pre-warm fast path from #787

  1. Sign out and clear site data to force a cold cache.
  2. Sign back in. LHN should populate within the same window as a baseline main session — the cache pre-warm path (existing in mergeCollectionWithPatches) is untouched by this PR.

Author Checklist

  • I linked the correct issue in the ### Related Issues section above
  • I linked the corresponding Expensify/App PR in the ### Linked E/App PR section above, and verified this change against it (E/App CI passed and manual testing completed) (companion PR [No QA] Pin react-native-onyx to PR #792 head for retry-idempotency end-to-end testing App#91626 pins this branch's HEAD for end-to-end exercise)
  • I wrote clear testing steps that cover the changes made in this PR
    • I added steps for local testing in the Tests section
    • I tested this PR with a High Traffic account against the staging or production API to ensure there are no regressions (e.g. long loading states that impact usability). (pending companion App PR #91626 manual run)
  • I included screenshots or videos for tests on all platforms (pending companion App PR #91626; this PR has no UI surface)
  • I ran the tests on all platforms & verified they passed on:
    • Android / native
    • Android / Chrome
    • iOS / native
    • iOS / Safari
    • MacOS / Chrome / Safari
  • I verified there are no console errors (if there's a console error not related to the PR, report it or open an issue for it to be fixed)
  • I followed proper code patterns (see Reviewing the code)
    • I verified that any callback methods that were added or modified are named for what the method does and never what callback they handle (i.e. toggleReport and not onIconClick)
    • I verified that the left part of a conditional rendering a React component is a boolean and NOT a string, e.g. myBool && <MyComponent />.
    • I verified that comments were added to code that is not self explanatory
    • I verified that any new or modified comments were clear, correct English, and explained "why" the code was doing something instead of only explaining "what" the code was doing.
    • I verified proper file naming conventions were followed for any new files or renamed files. All non-platform specific files are named after what they export and are not named "index.js". All platform-specific files are named for the platform the code supports as outlined in the README.
    • I verified the JSDocs style guidelines (in STYLE.md) were followed
  • If a new code pattern is added I verified it was agreed to be used by multiple Expensify engineers (orchestrator/write-helper split was suggested in Make mergeCollectionWithPatches cache-first #787 review)
  • I followed the guidelines as stated in the Review Guidelines
  • I tested other components that can be impacted by my changes (i.e. if the PR modifies a shared library or component like Avatar, I verified the components using Avatar are working as expected) (459/459 unit tests green)
  • I verified all code is DRY (the PR doesn't include any logic written more than once, with the exception of tests) (persistCollectionWrite is shared between setCollectionWithRetry and partialSetCollection since their storage tails are byte-identical)
  • I verified any variables that can be defined as constants (ie. in CONST.js or at the top of the file that uses the constant) are defined as such
  • I verified that if a function's arguments changed that all usages have also been updated correctly
  • If a new component is created I verified that: (N/A — no new components)
    • A similar component doesn't exist in the codebase (N/A)
    • All props are defined accurately and each prop has a /** comment above it */ (N/A)
    • The file is named correctly (N/A)
    • The component has a clear name that is non-ambiguous and the purpose of the component can be inferred from the name alone (N/A)
    • The only data being stored in the state is data necessary for rendering and nothing else (N/A)
    • If we are not using the full Onyx data that we loaded, I've added the proper selector in order to ensure the component only re-renders when the data it is using changes (N/A)
    • For Class Components, any internal methods passed to components event handlers are bound to this properly so there are no scoping issues (i.e. for onClick={this.submit} the method this.submit should be bound to this in the constructor) (N/A)
    • Any internal methods bound to this are necessary to be bound (i.e. avoid this.submit = this.submit.bind(this); if this.submit is never passed to a component event handler like onClick) (N/A)
    • All JSX used for rendering exists in the render method (N/A)
    • The component has the minimum amount of code necessary for its purpose, and it is broken down into smaller components in order to separate concerns and functions (N/A)
  • If any new file was added I verified that: (N/A — no new files)
    • The file has a description of what it does and/or why is needed at the top of the file if the code is not self explanatory (N/A)
  • If the PR modifies a generic component, I tested and verified that those changes do not break usages of that component in the rest of the App (i.e. if a shared library or component like Avatar is modified, I verified that Avatar is working as expected in all cases) (pending companion App PR #91626)
  • If the main branch was merged into this PR after a review, I tested again and verified the outcome was still expected according to the Test steps. (N/A — no merge from main yet)
  • I have checked off every checkbox in the PR author checklist, including those that don't apply to this PR.

Screenshots/Videos

Android: Native
Android: mWeb Chrome
iOS: Native
iOS: mWeb Safari
MacOS: Chrome / Safari
Screen.Recording.2026-05-26.at.14.41.29.mov
Screen.Recording.2026-05-26.at.14.42.06.mov
Screen.Recording.2026-05-26.at.14.42.27.mov
Screen.Recording.2026-05-26.at.14.43.07.mov
Screen.Recording.2026-05-26.at.14.44.17.mov
Screen.Recording.2026-05-26.at.14.44.48.mov

@elirangoshen elirangoshen marked this pull request as ready for review May 26, 2026 12:50
@elirangoshen elirangoshen requested a review from a team as a code owner May 26, 2026 12:50
@melvin-bot melvin-bot Bot requested review from dangrous and removed request for a team May 26, 2026 12:50
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: de62ae5eae

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread lib/OnyxUtils.ts Outdated
const {keyValuePairsToStore, newData} = params;

return Storage.multiSet(keyValuePairsToStore)
.catch((error) => OnyxUtils.retryOperation(error, persistMultiSetWrite, params, retryAttempt))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Retry orchestrator after eviction to restore cache state

When Storage.multiSet fails with a capacity error, retryOperation() evicts an LRU key via remove(), which immediately drops that key from cache and notifies subscribers before retrying. Retrying persistMultiSetWrite (and similarly persistCollectionWrite / persistMergedCollectionWrite) only re-attempts the storage write, so if the evicted key is part of the in-flight write, its cache value is never restored and subscribers keep the stale “removed” state even though storage eventually succeeds. This is a regression from the previous behavior where retries re-ran the orchestrator and re-applied cache updates.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 11bc0a6.

Fix: each orchestrator (multiSetWithRetry, setCollectionWithRetry, partialSetCollection, mergeCollectionWithPatches) now constructs a restoreEvictedKey(key) closure that re-applies its cache write + subscriber notification for a single in-flight key. The closure is passed via the write helper's params; retryOperation invokes it after remove(keyForRemoval) and before re-entering the write helper. No-op when the evicted key isn't part of this write (the typical case).

mergeCollection nuance: the closure snapshots the cache's post-merge state (not just the delta) so previously-merged-in fields survive when the eviction's cache.drop forces us to re-populate cache from scratch — otherwise we'd lose fields like {id: 1} when restoring after a mergeCollection({key: {value: 'new'}}) retry.

Regression coverage: added 3 tests under storage eviction — one per write helper — that mock Storage.multiSet/Storage.multiMerge to fail with database or disk is full once on a setup where the LRU evictable key is the same key being written, and assert both cache + subscriber reflect the restored value (not undefined). All 3 fail without the lib changes (Received: undefined / missing fields) and pass with them.

459/459 tests pass; tsc and prettier clean.

elirangoshen added a commit to callstack-internal/react-native-onyx that referenced this pull request May 26, 2026
Addresses Codex P1 on Expensify#792 review:
Expensify#792 (comment)

When `retryOperation()` evicts an LRU key via `remove()` on a storage
capacity error, `remove()` immediately drops the key from cache and
fires `keyChanged(undefined)`. With the orchestrator/write-helper split
introduced in this PR, the retry only re-attempts the storage write --
it does not re-apply cache state. So if the evicted key is part of the
in-flight write, cache + subscribers stay in the "removed" state even
though the storage retry eventually succeeds, leaving the App's view
of the world inconsistent with persisted data.

Fix: each orchestrator now constructs a `restoreEvictedKey(key)`
closure that re-applies its cache write + subscriber notification for
a single in-flight key, and passes it via the write helper's params.
`retryOperation` invokes the closure after `remove(keyForRemoval)`,
before re-entering the write helper. The closure is a no-op when the
evicted key isn't part of this write (the typical case where some
unrelated evictable key gets selected).

For mergeCollection specifically, the closure snapshots the cache's
post-merge state for in-flight keys (rather than just the merge delta)
so that previously-merged-in fields are preserved when the eviction's
cache.drop forces us to re-populate cache from scratch.

Adds 3 regression tests under `storage eviction`:
- `multiSet` -- LRU evictable is the in-flight key (persistMultiSetWrite path)
- `setCollection` -- same scenario via persistCollectionWrite
- `mergeCollection` -- same scenario via persistMergedCollectionWrite,
  asserts the previously-existing `id` field survives the eviction

All 3 tests fail with `Received: undefined` / missing fields if the
lib changes are reverted; pass with them in place.

All 459 unit tests pass (456 existing + 3 new).
@fabioh8010
Copy link
Copy Markdown
Contributor

i will review it soon

@elirangoshen elirangoshen changed the title Make retried Onyx writes idempotent on storage retry [Hold] Make retried Onyx writes idempotent on storage retry May 27, 2026
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@elirangoshen elirangoshen force-pushed the elirangoshen/fix/retry-side-effects-idempotency branch from e70af5e to bd0b1e9 Compare May 28, 2026 13:46
@elirangoshen
Copy link
Copy Markdown
Contributor Author

@codex review

@elirangoshen elirangoshen changed the title [Hold] Make retried Onyx writes idempotent on storage retry [Hold] Avoid subscriber re-fire when storage writes retry May 28, 2026
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bd0b1e9819

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread lib/OnyxUtils.ts Outdated
@cristipaval cristipaval requested review from Julesssss and cristipaval and removed request for dangrous May 29, 2026 07:55
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@elirangoshen elirangoshen changed the title [Hold] Avoid subscriber re-fire when storage writes retry [Do Not Merge] Avoid subscriber re-fire when storage writes retry May 29, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants